Baseball was the first major sport to embrace analytics. The game
lends itself to recording data: at any given moment there is a small
number of possible states and a small number of possible outcomes.
One could do a term project on the development of analytics in
sports, from Bill James, to Moneyball, to the present Statcast
era.
Rules Overview
For those unfamiliar with the game:
- Nine players per side (with substitutions available)
- Nine innings per game. Each inning has two halves.
- In each half inning one team bats, the other plays the field.
- The basic unit of action in baseball is the pitch:
  - the pitcher pitches a ball over home plate to be received
    by the catcher.
  - the batter tries to hit the pitch into play.
- If the ball is hit into play, the batter tries to advance to 1st
  base (or beyond!).
- Possible outcomes of any one pitch:
  - ball
  - hit by pitch
  - swinging strike
  - called strike
  - foul ball
  - hit into play

- The hitting team scores a run when a player advances around the
bases to return to home plate.
- The team with the most runs after 9 innings wins. (No ties, so
‘extra’ innings if necessary.)
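- The scoreboard:

|         | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | R | H  | E |
|---------|---|---|---|---|---|---|---|---|---|---|----|---|
| visitor | 0 | 1 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 4 | 13 | 1 |
| home    | 2 | 1 | 0 | 0 | 3 | 0 | 1 | 1 | x | 8 | 9  | 0 |

  Columns 1 to 9 give the runs scored in each inning; R, H, and E are
  the team's total runs, hits, and errors. The x marks the bottom of the
  9th, which is not played when the home team is already ahead.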
- The 9 defensive positions of the fielding team:

[Figure: diagram of the nine fielding positions. Source:
whatisbaseball.com/wp-content/uploads/2019/04/Diagram-of-BB.png]
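As a small illustration of how a scoreboard is read, the sketch below adds up the nine inning-by-inning entries for each team from the scoreboard above. The variable and function names are made up for this example, and the `x` (a half inning that was not played) is represented as `None`.

```python
# Inning-by-inning runs from the scoreboard above.
# None stands for the "x": a half inning that was not played.
visitor_innings = [0, 1, 2, 0, 0, 1, 0, 0, 0]
home_innings = [2, 1, 0, 0, 3, 0, 1, 1, None]


def total_runs(innings):
    """Sum the runs over the half innings that were actually played."""
    return sum(runs for runs in innings if runs is not None)


visitor_total = total_runs(visitor_innings)
home_total = total_runs(home_innings)

print(visitor_total, home_total)  # 4 8
```

The totals match the R column of the scoreboard (4 for the visitor, 8 for the home team).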
Possible states
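Baseball has a small number (24) of possible game states at the start of
each plate appearance. There are 3 possible out counts (0, 1, or 2) and
2^3 = 8 ways the bases can be occupied, giving 3 × 8 = 24 states
(1 = runner on base, 0 = base empty):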
|
outs
|
3rd base
|
2nd base
|
1st base
|
|
0
|
0
|
0
|
0
|
|
0
|
0
|
0
|
1
|
|
0
|
0
|
1
|
0
|
|
0
|
0
|
1
|
1
|
|
0
|
1
|
0
|
0
|
|
0
|
1
|
0
|
1
|
|
0
|
1
|
1
|
0
|
|
0
|
1
|
1
|
1
|
|
1
|
0
|
0
|
0
|
|
1
|
0
|
0
|
1
|
|
1
|
0
|
1
|
0
|
|
1
|
0
|
1
|
1
|
|
1
|
1
|
0
|
0
|
|
1
|
1
|
0
|
1
|
|
1
|
1
|
1
|
0
|
|
1
|
1
|
1
|
1
|
|
2
|
0
|
0
|
0
|
|
2
|
0
|
0
|
1
|
|
2
|
0
|
1
|
0
|
|
2
|
0
|
1
|
1
|
|
2
|
1
|
0
|
0
|
|
2
|
1
|
0
|
1
|
|
2
|
1
|
1
|
0
|
|
2
|
1
|
1
|
1
|
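The 24 states can also be generated rather than listed by hand. The following is a minimal sketch, assuming each state is stored as a tuple `(outs, third, second, first)` with 1 for an occupied base; the tuple layout and the name `states` are choices made for this example only.

```python
from itertools import product

# Each state is (outs, third, second, first); 1 means the base is occupied.
# Outs can be 0, 1, or 2, and each of the three bases is empty or occupied.
states = [
    (outs, third, second, first)
    for outs in range(3)
    for third, second, first in product((0, 1), repeat=3)
]

print(len(states))  # 24
print(states[0])    # (0, 0, 0, 0): no outs, bases empty
print(states[-1])   # (2, 1, 1, 1): two outs, bases loaded
```

The list comes out in the same order as the rows of the table: outs change slowest and 1st base changes fastest.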
A quick history of keeping statistics
- Retrosheet: a nice site with historical baseball statistics,
  including game-by-game results.
- A basic box score at Retrosheet.
Historically important statistics
Batting
- Hits (H)
- Home runs (HR)
- Runs batted in (RBI)
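- Batting average (AVG). Batting average is defined as the ratio H/AB,
  where
  - H = hits, defined as the sum of singles, doubles, triples,
    and home runs; and
  - AB = ‘at bats’, defined as
    \[\text{AB} = \text{PA} - (\text{BB} + \text{HBP} + \text{SF} + \text{SH} + \text{CI})\]
    where PA is plate appearances, BB is walks (bases on balls), HBP is
    hit by pitch, SF is sacrifice flies, SH is sacrifice hits (bunts),
    and CI is catcher's interference.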
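The definitions of AB and AVG above translate directly into a short calculation. This is only a sketch: the function names and the season line are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def at_bats(pa, bb, hbp, sf, sh, ci=0):
    """AB = PA - (BB + HBP + SF + SH + CI)."""
    return pa - (bb + hbp + sf + sh + ci)


def batting_average(singles, doubles, triples, home_runs, ab):
    """AVG = H / AB, where H is the sum of all four kinds of hit."""
    hits = singles + doubles + triples + home_runs
    return hits / ab


# Hypothetical season line, not a real player.
ab = at_bats(pa=600, bb=60, hbp=5, sf=4, sh=1, ci=0)
avg = batting_average(singles=100, doubles=30, triples=4, home_runs=25, ab=ab)

print(ab)            # 530
print(f"{avg:.3f}")  # 0.300
```

Note that walks and sacrifices reduce AB, so they neither help nor hurt a hitter's batting average.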
The hard-to-achieve hitting Triple Crown: leading the league in AVG,
HR, and RBI in the same season.
Other important offensive stats, historically:
- Runs scored (R)
- Stolen bases (SB)
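- Slugging percentage (SLG),
  \[ \text{SLG} = \frac{\text{H} + \text{2B} + 2 \times \text{3B} + 3 \times \text{HR}}{\text{AB}}\]
  where 2B and 3B are the numbers of doubles and triples. Since every
  hit already counts once in H, this equals total bases divided by AB.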
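To see why the SLG formula amounts to total bases per at bat, the sketch below computes it two ways: once as written above, and once by weighting singles, doubles, triples, and home runs by 1, 2, 3, and 4 bases. The function names and the season line are hypothetical.

```python
def slugging(hits, doubles, triples, home_runs, ab):
    """SLG as written above: (H + 2B + 2*3B + 3*HR) / AB."""
    return (hits + doubles + 2 * triples + 3 * home_runs) / ab


def slugging_from_total_bases(singles, doubles, triples, home_runs, ab):
    """The same quantity computed as total bases divided by AB."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / ab


# Hypothetical season line, not a real player.
singles, doubles, triples, home_runs, ab = 100, 30, 4, 25, 530
hits = singles + doubles + triples + home_runs  # 159

slg = slugging(hits, doubles, triples, home_runs, ab)
slg_tb = slugging_from_total_bases(singles, doubles, triples, home_runs, ab)

print(f"{slg:.3f}")     # 0.513
print(f"{slg_tb:.3f}")  # 0.513
```

Both versions give 272 total bases over 530 at bats, so the two formulas agree.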
Pitching
- Wins (W)
- Strikeouts (K)
- Earned Run Average (ERA), which is defined as
  \[\text{ERA} = 9 \times \frac{\text{ER}}{\text{IP}},\]
  where ER represents the number of earned runs allowed by the pitcher,
  and IP represents the number of innings pitched.
- WHIP (walks + hits per inning pitched)
  \[\text{WHIP} = \frac{\text{BB} + \text{H}}{\text{IP}}\]
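The two pitching rate statistics can be computed in the same way. In this sketch, innings pitched are derived from outs recorded (three outs per inning), which is an assumption made here to avoid ambiguity about partial innings; the function names and the season line are hypothetical.

```python
def innings_pitched(outs_recorded):
    """Convert outs recorded to innings pitched (3 outs per inning)."""
    return outs_recorded / 3


def era(earned_runs, ip):
    """ERA = 9 * ER / IP: earned runs allowed per nine innings."""
    return 9 * earned_runs / ip


def whip(walks, hits, ip):
    """WHIP = (BB + H) / IP: walks plus hits allowed per inning."""
    return (walks + hits) / ip


# Hypothetical season line, not a real pitcher.
ip = innings_pitched(600)  # 200.0 innings
pitcher_era = era(earned_runs=60, ip=ip)
pitcher_whip = whip(walks=50, hits=170, ip=ip)

print(f"{pitcher_era:.2f}")   # 2.70
print(f"{pitcher_whip:.2f}")  # 1.10
```

Multiplying by 9 puts ERA on the scale of a full game, while WHIP stays on a per-inning scale.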
- These statistics were (are?) of central importance in evaluating the
greatness of players (MVP awards and admission to the Hall of
Fame).
- But are these statistics the best measure of player performance and,
more importantly, team performance?

